Abstract: We consider a class of small-sample distribution
estimators over noisy channels. Our estimators are designed for 
repetition channels, and rely on properties of the runs of the 
observed sequences. These runs are modeled via a special type 
of Markov chains, termed alternating Markov chains. We show 
that alternating chains have redundancy that scales sub-linearly 
with the lengths of the sequences, and describe how to use a 
distribution estimator for alternating chains for the purpose of 
distribution estimation over repetition channels. 

I. Introduction 

The problem of estimating the distribution of a source with 
a large alphabet, based on a small number of observations, 
is of significant interest in molecular biology, neuroscience, 
physics, statistics, and learning theory. To address
this problem, a number of sophisticated classes of estimators
have been developed over the years, including those of Good and
Turing [1] and Orlitsky et al., to cite a few. The idea behind
these estimators is to use frequencies of symbol frequencies,
rather than the simple frequency counts used in Maximum
Likelihood (ML) estimation.

An additional problem in this estimation setting arises when 
some of the observations are inaccurate. Since most known 
distribution estimators are based on frequency counts, errors 
that change these counts may have a significant bearing on 
the accuracy of the method. A particularly interesting case 
is when the counts are changed by consecutive repetitions of 
some symbols. 

In earlier work, we described a collection of distribution estimators,
based on Expectation Maximization (EM), both for channels 
with known and channels with unknown repetition parameters. 
The focal point of the study was a class of sequences, termed 
alternating sequences, generated by a Markov chain of special 
topology. The goal of this work is twofold. The first goal is 
to establish a rigorous analytical framework for evaluating the 
redundancy of alternating Markov chains. The second goal 
is to describe how to use alternating sequence distribution 
estimators for distribution estimation over repetition channels. 
In particular, we exhibit block and sequential estimators for 
alternating sequences that have vanishing redundancy and 
provide for accurate estimation in the presence of repetitions. 



It is important to observe that although alternating sequences
are generated by a Markov chain, their redundancy
cannot be accurately computed using the methods developed
in [6]. This is because the bounds of [6] are
too general to give useful redundancy characterizations for
special classes of Markov chains. Surprisingly, the special
class of alternating Markov sequences has properties that
may be analyzed using tools developed for i.i.d. sequences,
with some appropriate modification. Our analysis reveals that
alternating Markov chains have sub-linear pattern redundancy
in the sequence length $n$, scaling as $c\sqrt{n} + \log n$ for some
constant $c$. This is a counterpart of the examples described
in [6], where Markov chains with redundancy of order $n \log n$
were constructed using permutation patterns.

The paper is structured as follows. In Section II we
introduce the problem of distribution estimation over noisy
channels and the notion of alternating sequences. In Section
III we review the basic ideas and terminology behind the
proposed estimation method, including the notions of patterns
and profiles. Sections IV-A and IV-B are devoted to deriving
upper and lower bounds on the redundancy of block estimators
for alternating sequences, respectively. A sequential estimator
for alternating sequences is presented in Section V. Section VI
describes how to determine the source distribution, observed
through a noisy channel, based on the estimated probabilities
of alternating sequences.

II. Small-Sample Distribution Estimation 

Consider a sample sequence x generated by an i.i.d. source 
S defined over a large-cardinality alphabet A. Suppose that 
the source S has distribution $p_S$. The estimator observes an
erroneous version of x, denoted by y. The errors are modeled 
as arising from a channel C with input x and output y. What are 
the ultimate performance limits for estimating the distribution 
of the source, given that the length of x is small compared to
$|A|$ or comparable to $|A|$?

Estimating the distribution of a source based on a noise-free 
short sequence x is a challenging task. ML estimators perform
poorly in this setting as they typically overestimate the
probabilities of seen symbols and underestimate the probabilities of
unseen symbols. More appropriate solutions for this scenario
are due to Good and Turing [1] and Orlitsky et al.

There are two approaches one can follow to address an even 
more difficult family of problems, namely that of small-sample 
distribution estimation in the presence of errors. 

In one scenario, one may try to first denoise the output
sequence y, so as to obtain an estimate $\hat{x}$ of x, and then
apply a small-sample distribution estimator to $\hat{x}$. Note that $\hat{x}$
depends on both y and $p_S$, and thus one needs to estimate $p_S$
in order to estimate x and vice versa. This "estimation loop"
may be resolved via the use of iterative methods that alternate
in improving estimates for x and $p_S$ [5]. In another scenario,
one may try to first estimate the distribution of y and then
reconstruct the distribution of x by "inversion" of the noisy
channel. Here, we pursue the second line of reasoning.

The focal point of our inversion study is a special class 
of Markov sequences that arise in the study of distribution 
estimation over repetition channels. A repetition channel is a 
channel which outputs several copies of each input symbol. 
The number of copies is a random variable with a predetermined
distribution with known or unknown parameters. One
important property of repetition channels is that they maintain
the identity and order of symbols in the sequence, and only
alter the symbols' runlengths. As an example, the sequence
x = 'committee' passed through a repetition channel may be
observed as y = 'ccommmiitttee'. The alternating sequence
of a sequence x, V(x), is the sequence obtained from x by
replacing each run of x by a single symbol. We refer to
V(x) = V(y) as the alternating sequence and denote it by
v. Note that v is a Markov sequence and its corresponding
Markov chain is referred to as an alternating Markov chain.
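The collapse of runs that defines V(x) can be illustrated with a short sketch. The function name `alternating_sequence` is a label chosen here for illustration and does not appear in the paper; the two inputs are the example strings from the text.

```python
from itertools import groupby


def alternating_sequence(x):
    """Replace every run of identical symbols in x by a single symbol."""
    # groupby yields one (symbol, run) pair per maximal run of equal symbols.
    return "".join(symbol for symbol, _run in groupby(x))


x = "committee"
y = "ccommmiitttee"  # x observed through a repetition channel

# Both sequences collapse to the same alternating sequence v.
print(alternating_sequence(x))
print(alternating_sequence(y))
```

Since a repetition channel changes only runlengths, the channel input and output always collapse to the same sequence, which is why V(x) = V(y).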

Throughout the rest of the paper, we reserve the symbol 
N to denote the length of the source output x. We also use 
m to denote the cardinality of the alphabet A, which may be 
infinite, and n to denote the length of the alternating sequence 
v. 

III. Patterns, Profiles, and Technical 
Preliminaries 

The pattern $\psi = \Psi(x)$ of a sequence x is obtained by
replacing each symbol by its order of appearance in x. The
profile $\bar{\varphi}$ of a pattern $\psi$ is the vector $\Phi(\psi) = (\varphi_1, \ldots, \varphi_n)$,
where $\varphi_i$ is the number of symbols that appear i times in
$\psi$. We use the shorthand notation $\Phi(x)$ to denote $\Phi(\Psi(x))$.
Notational confusion can be avoided by noting whether the
argument of $\Phi(\cdot)$ is a sequence or a pattern.

For example, the profile of the pattern $\psi = 1232421$ is
$\bar{\varphi} = (2,1,1,0,0,0,0)$, since 3 and 4 appear once, 1 appears
twice, and 2 appears three times. Note that many patterns may 
have the same profile. 
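As a minimal sketch of these definitions, the following code computes the pattern of a sequence and the profile of a pattern, and checks the alternating property. The function names and the sample string `"abcbdba"` are illustrative choices made here; the sample string was picked because its pattern is the example pattern 1232421.

```python
from collections import Counter


def pattern(x):
    """Replace each symbol of x by its order of first appearance (1-based)."""
    first_seen = {}
    return [first_seen.setdefault(s, len(first_seen) + 1) for s in x]


def profile(psi):
    """Return (phi_1, ..., phi_n): phi_i counts the symbols appearing i times."""
    multiplicities = Counter(psi)             # symbol -> number of occurrences
    parts = Counter(multiplicities.values())  # multiplicity -> number of symbols
    return tuple(parts[i] for i in range(1, len(psi) + 1))


def is_alternating(psi):
    """True if no two consecutive entries are equal."""
    return all(a != b for a, b in zip(psi, psi[1:]))


psi = pattern("abcbdba")
print(psi, profile(psi), is_alternating(psi))
```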

It is clear that $\psi$ is the pattern of an alternating sequence v
of length n if and only if $\psi_i \neq \psi_{i+1}$ for $1 \le i \le n-1$. As
will be seen later, a necessary and sufficient condition for $\bar{\varphi}$
to be the profile of some alternating sequence is that $\varphi_i = 0$
for $i > \lceil n/2 \rceil$.



Remark 1. Observe that every profile of patterns of length n
corresponds to a partition of the integer n. The number of
parts of size i is $\varphi_i$, and $\sum_{i=1}^{n} i\varphi_i = n$. In the example above,
$\bar{\varphi} = (2,1,1,0,0,0,0)$ corresponds to an (unordered) partition
of 7 into two parts of size 1, one part of size 2, and one part
of size 3, i.e., 7 = 1+1+2+3. In the correspondence between
partitions and profiles, the number of parts of size $\mu$ equals
the number of symbols that appear $\mu$ times. The number of
partitions of n is denoted by p(n).

Let $\mathcal{I}^n$ be the collection of i.i.d. distributions over length-n
sequences. Consider a probability distribution p over A that
assigns probability $0 < p_s < 1$ to $s \in A$. Then, the distribution
induced by p over $A^n$ is denoted by $p^n$ and is defined in such
a way that, for all $x \in A^n$,

$$p^n(x) = \prod_{i=1}^{n} p_{x_i}.$$

Note that $p^n \in \mathcal{I}^n$.

Every distribution p over A also induces a distribution $p_{\Psi^n}$,

$$p_{\Psi^n}(\psi) := p^n(\{x : \Psi(x) = \psi\}),$$

over patterns of length n. The set of all such induced distributions
is denoted by $\mathcal{I}_{\Psi}^n$. We simply write $p(x)$ and $p(\psi)$ if
dropping the subscripts and superscripts causes no confusion.

Furthermore, p induces a probability distribution over alternating
sequences of length n, which for $v = v_1 \cdots v_n$ takes
the form

$$p_{V^n}(v) = p_{v_1} \prod_{i=2}^{n} \frac{p_{v_i}}{1 - p_{v_{i-1}}}. \tag{1}$$

The set of all such induced distributions is denoted by $\mathcal{V}^n$. The
induced distribution $p_{V_\Psi^n}$ over alternating patterns of length n
and the set $\mathcal{V}_\Psi^n$ are defined similarly to their unconstrained
sequence counterparts. Note that $\mathcal{V}^n$ is Markovian.
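The product form (1) is easy to evaluate directly. The sketch below is an illustration only: the function name and the three-symbol distribution are assumptions made here, and exact rational arithmetic is used so that the normalization over all alternating sequences of a fixed length can be checked without rounding.

```python
from fractions import Fraction
from itertools import product


def alternating_probability(v, p):
    """Probability of the alternating sequence v under equation (1)."""
    if any(a == b for a, b in zip(v, v[1:])):
        return Fraction(0)  # not an alternating sequence
    prob = p[v[0]]
    for prev, cur in zip(v, v[1:]):
        # Transition prev -> cur: the next distinct symbol is cur.
        prob *= p[cur] / (1 - p[prev])
    return prob


# Hypothetical source distribution over a three-symbol alphabet.
p = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

print(alternating_probability("abca", p))
# The probabilities of all alternating sequences of length 3 sum to one.
print(sum(alternating_probability(v, p) for v in product("abc", repeat=3)))
```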

A. Properties of Alternating Sequences 

The first issue we address is the relationship between the 
length of the source sequence and the length of the corre- 
sponding alternating sequence. The following lemma will be 
useful in our subsequent discussion. 

Lemma 2. Assuming all symbol probabilities are smaller than
1/2, we have

$$P\left[n < N/2 - \sqrt{8N \ln N}\right] < \frac{1}{N}, \tag{2}$$

so that with high probability, $n \ge N/2 - \sqrt{8N \ln N}$.

Proof: For an i.i.d. sequence $x = x_1 \cdots x_N$, let R(x)
denote the number of runs of x. Let $n_j$, $0 \le j \le N$, be the
martingale sequence defined as

$$n_j = E[R(x) \mid x_1 \cdots x_j].$$

This exposure martingale is obtained by revealing the values
of $x_i$, $1 \le i \le N$, one by one; i.e., the filtration exposes the
values of $x_i$ sequentially.

Note that $n_0 = E[R(x)]$ and $n = n_N = R(x)$. If two sequences $x$ and
$x'$ differ in only one symbol, then $|R(x) - R(x')| \le 2$.
This implies [7, Theorem 7.4.1] that

$$|n_{j+1} - n_j| \le 2, \quad 0 \le j < N.$$

Hence, for all $\lambda > 0$ [7, Theorem 7.4.2],

$$P\left[n < E[n] - 2\lambda\sqrt{N}\right] < e^{-\lambda^2/2}, \tag{3}$$

and thus n is concentrated around its mean. To determine the
tail behavior of the distribution of n, we need to find the
expected number of runs E[n] of an i.i.d. sequence x. Note
that n is a simple random variable,

$$n = \sum_{i=1}^{N} I_i,$$

where $I_i$ is the indicator function of the event that a run starts
at position i of x. More precisely, $I_i = 1$ if and only if a run starts
at position i, and $I_i = 0$ otherwise.

For i = 1, we have $I_1 = 1$, and for $2 \le i \le N$, we have

$$P[I_i = 1] = \sum_{a \in A} p_a (1 - p_a) = 1 - \sum_{a \in A} p_a^2.$$

Hence,

$$E[n] = \sum_{i=1}^{N} E[I_i] = 1 + (N-1)\Big(1 - \sum_{a \in A} p_a^2\Big).$$

Assuming all symbol probabilities are smaller than 1/2 implies
$1 - \sum_{a \in A} p_a^2 > 1/2$ and thus $E[n] > N/2$. Thus, under this
assumption, from (3) with $\lambda = \sqrt{2 \ln N}$, the lemma follows.

■
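The closed form for E[n] used in the proof can be checked numerically on small cases. The following sketch compares the formula with a brute-force expectation over all sequences of length N; the function names and the two test distributions are illustrative assumptions made here, not part of the paper.

```python
from fractions import Fraction
from itertools import groupby, product
from math import prod


def count_runs(x):
    """Number of runs (maximal blocks of equal symbols) in x."""
    return sum(1 for _ in groupby(x))


def expected_runs(probs, N):
    """E[n] = 1 + (N - 1) * (1 - sum_a p_a^2) for an i.i.d. source."""
    return 1 + (N - 1) * (1 - sum(q * q for q in probs))


def brute_force_expected_runs(probs, N):
    """Exact expectation obtained by enumerating all length-N sequences."""
    symbols = range(len(probs))
    return sum(prod(probs[s] for s in x) * count_runs(x)
               for x in product(symbols, repeat=N))


uniform = [Fraction(1, 4)] * 4
skewed = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]
print(expected_runs(uniform, 3), brute_force_expected_runs(uniform, 3))
print(expected_runs(skewed, 4), brute_force_expected_runs(skewed, 4))
```

In both examples every symbol probability is below 1/2, so the expected number of runs exceeds N/2, as used in the last step of the proof.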

The second property, stated in the following lemma, concerns
patterns with the same probability. It
provides the means for analyzing alternating sequences using
the techniques developed for i.i.d. sequences.

Lemma 3. Let $\psi = \psi_1\psi_2\cdots\psi_n$ and $\psi' = \psi'_1\psi'_2\cdots\psi'_n$ be two
alternating patterns with profile $\bar{\varphi}$ such that the multiplicity of
$\psi_n$ is equal to the multiplicity of $\psi'_n$. For any i.i.d. distribution
p, we have $p(\psi) = p(\psi')$.

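Proof: Let the alphabet be $A = \{a_1, a_2, \ldots\}$ and let the
probability of $a_i$ be $p_i$. Assume $\psi_n = a$ and $\psi'_n = b$, with
$a, b \in [k]$, where k is the number of distinct elements appearing in
$\psi$. Let the multiplicity of $i \in [k]$ in $\psi$ be denoted by $\mu_i$ and
the multiplicity of i in $\psi'$ be denoted by $\mu'_i$. Note that, by
assumption, $\mu_a = \mu'_b$.

Furthermore, let f be a bijection between the set
$\{1, 2, \ldots, k\}$ and a subset $A' \subseteq A$ of size k. This bijection
determines which symbol of the alphabet is assigned to each
symbol in the pattern. Writing $p_{f(i)}$ for the probability of the
symbol $f(i)$, the probability assigned to the pattern $\psi$ is

$$p(\psi) = \sum_{f} p_{f(\psi_1)} \prod_{i=2}^{n} \frac{p_{f(\psi_i)}}{1 - p_{f(\psi_{i-1})}} = \sum_{f} \frac{(p_{f(a)})^{\mu_a}}{(1 - p_{f(a)})^{\mu_a - 1}} \prod_{i \in [k] \setminus \{a\}} \left(\frac{p_{f(i)}}{1 - p_{f(i)}}\right)^{\mu_i},$$

where the summation is over all bijections between the set
$\{1, 2, \ldots, k\}$ and a subset $A' \subseteq A$ of size k. There exists a
permutation g over [k] with $g(b) = a$ and $g(j) = i$ for all $j \in
[k] \setminus \{b\}$, such that $\mu_i = \mu'_j$. Then, by letting $f'(\cdot) = f(g(\cdot))$,
we have

$$\frac{(p_{f(a)})^{\mu_a}}{(1 - p_{f(a)})^{\mu_a - 1}} \prod_{i \in [k] \setminus \{a\}} \left(\frac{p_{f(i)}}{1 - p_{f(i)}}\right)^{\mu_i} = \frac{(p_{f'(b)})^{\mu'_b}}{(1 - p_{f'(b)})^{\mu'_b - 1}} \prod_{j \in [k] \setminus \{b\}} \left(\frac{p_{f'(j)}}{1 - p_{f'(j)}}\right)^{\mu'_j}.$$

By summing both sides of the equality over all bijections
between the set $\{1, 2, \ldots, k\}$ and a subset $A' \subseteq A$ of size
k, it follows that $p(\psi) = p(\psi')$. ■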
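Lemma 3 can be checked by brute force on a small alphabet. The sketch below sums the alternating-sequence probability over all injective assignments of alphabet symbols to pattern symbols; the function name and the three-symbol distribution are assumptions introduced here for illustration. The patterns 1231 and 1232 share the profile (2,1,0,0) and both end in a symbol of multiplicity 2, whereas 1213 has the same profile but ends in a symbol of multiplicity 1.

```python
from fractions import Fraction
from itertools import permutations


def pattern_probability(psi, probs):
    """Probability of the alternating pattern psi under an i.i.d. source.

    Sums the alternating-sequence probability over every injective map
    from the pattern symbols {1, ..., k} to alphabet symbols.
    """
    k = max(psi)
    total = Fraction(0)
    for assignment in permutations(range(len(probs)), k):
        v = [assignment[s - 1] for s in psi]
        term = probs[v[0]]
        for prev, cur in zip(v, v[1:]):
            term *= probs[cur] / (1 - probs[prev])
        total += term
    return total


# Hypothetical source distribution over three symbols.
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

print(pattern_probability([1, 2, 3, 1], probs))
print(pattern_probability([1, 2, 3, 2], probs))
print(pattern_probability([1, 2, 1, 3], probs))
```

The first two values coincide, as the lemma predicts, while the third differs, which suggests that the condition on the multiplicity of the last element cannot simply be dropped.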



IV. Block Estimators for Alternating Sequences 

A. Upper Bound on Redundancy of Alternating Patterns 

We start by deriving an upper bound on the worst case
redundancy of alternating sequences, defined, with respect to
a collection of distributions $\mathcal{P}$, as

$$\hat{R}(\mathcal{P}) = \inf_{q} \sup_{p \in \mathcal{P}} \sup_{u \in U(p)} \log \frac{p(u)}{q(u)}, \tag{4}$$

where U(p) is the support set of the distribution p, q(u) is
the probability assigned to u by the estimator q, and p(u)
is the probability of u with respect to the distribution p.
As already pointed out, alternating sequences are first-order 
Markov processes. Prior results by Dhulipala and Orlitsky [6] 
showed that in general, the per-symbol pattern redundancy 
of a first-order Markov process may be unbounded. For the 
particular case of alternating sequences, however, we show 
that the per-symbol pattern redundancy tends to zero. 

Suppose that $\mathcal{P}$ is a collection of distributions over patterns
of length n and let $\Psi(\mathcal{P})$ denote the set of patterns with
positive probability with respect to some distribution in $\mathcal{P}$.
Denote the set of profiles of patterns in $\Psi(\mathcal{P})$ by $\Phi(\mathcal{P})$ and
let $\Psi^n := \Psi(\mathcal{V}_\Psi^n)$ and $\Phi^n := \Phi(\mathcal{V}_\Psi^n)$. Also, for a pattern $\psi^n$
of length n, let the set of all alternating patterns with the same
profile as that of $\psi^n$ be denoted by $\Psi(\Phi(\psi^n))$.

If $\Psi(\mathcal{P})$ is partitioned into M classes such that any distribution
in $\mathcal{P}$ assigns the same probability to all patterns in the
same class, then the worst case redundancy is bounded by [8]

$$\hat{R}(\mathcal{P}) \le \log M. \tag{5}$$

For the collection $\mathcal{I}_\Psi^n$, patterns with the same profile have
the same probability and thus $M = |\Phi(\mathcal{I}_\Psi^n)|$. The cardinality
of $\Phi(\mathcal{I}_\Psi^n)$ is equal to the number of partitions of n because
there is a one-to-one correspondence between distinct profiles
$\Phi(x^n)$ of i.i.d. sequences $x^n$ and partitions of n. Not every
partition corresponds to a profile of an alternating sequence.
For example, for $\bar{\varphi} = (0, 0, \ldots, 0, 1)$, a partition with one
part of size n, there is no alternating sequence $v^n$ such that
$\Phi(v^n) = \bar{\varphi}$ and thus $\bar{\varphi} \notin \Phi^n$. It is, however, easy to see
that every profile of an alternating sequence corresponds to a
unique partition and hence $|\Phi^n| \le p(n)$.



Theorem 4. The worst case redundancy of $\mathcal{V}_\Psi^n$ grows at most
linearly with $\sqrt{n}$. More precisely,

$$\hat{R}(\mathcal{V}_\Psi^n) \le \left(\pi\sqrt{\tfrac{2}{3}}\,\log e\right)\sqrt{n} + \log n.$$

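Proof: From Lemma 3, all patterns with the same profile
and the same multiplicity of the last element have the same
probability. Hence, for any distribution $p \in \mathcal{V}_\Psi^n$ and any pattern $\psi \in \Psi^n$,

$$p(\psi) \le \frac{1}{L(\psi)}, \tag{6}$$

where $L(\psi)$ denotes the number of patterns with the same
profile and the same multiplicity of the last element as $\psi$.
Consider an estimator q that assigns probability

$$q(\psi) = \frac{1/L(\psi)}{\sum_{\psi' \in \Psi^n} 1/L(\psi')} \tag{7}$$

to $\psi \in \Psi^n$. We have that

$$\sup_{p \in \mathcal{V}_\Psi^n} \sup_{\psi \in \Psi^n} \frac{p(\psi)}{q(\psi)} \le \sum_{\psi' \in \Psi^n} \frac{1}{L(\psi')} \tag{8}$$

$$\le \sum_{\bar{\varphi} \in \Phi^n} \sum_{k \in [n]} \sum_{\psi'} \frac{1}{L(\psi')}, \tag{9}$$

where the innermost sum is over the patterns $\psi' \in \Psi^n$ with profile
$\bar{\varphi}$ whose last element has multiplicity k. In the triple summation above,
the index k corresponds to the
multiplicity of the last element $\psi'_n$ of $\psi'$. By definition of
$L(\psi')$, for every fixed $\bar{\varphi}$ and k we have

$$\sum_{\psi'} \frac{1}{L(\psi')} \le 1 \tag{10}$$

and thus

$$\sup_{p \in \mathcal{V}_\Psi^n} \sup_{\psi \in \Psi^n} \frac{p(\psi)}{q(\psi)} \le \sum_{\bar{\varphi} \in \Phi^n} \sum_{k \in [n]} 1 \le n\,|\Phi^n|, \tag{11}$$

which implies that

$$\hat{R}(\mathcal{V}_\Psi^n) \le \log |\Phi^n| + \log n \le \log p(n) + \log n. \tag{12}$$

The theorem then follows from $p(n) < e^{\pi\sqrt{2n/3}}$ [pp. 8-102]. ■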
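To give a feeling for the quantities in (12), the following sketch computes the partition numbers p(n) with a standard dynamic program and compares $\log p(n) + \log n$ with the closed-form bound of Theorem 4. Logarithms are taken to base 2 here, which is an assumption about the unit (bits); the function names are illustrative.

```python
import math


def partition_counts(n_max):
    """Return [p(0), p(1), ..., p(n_max)], the integer partition numbers."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):         # allow parts of this size
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p


def theorem4_bound_bits(n):
    """(pi * sqrt(2/3) * log2 e) * sqrt(n) + log2 n."""
    return (math.pi * math.sqrt(2.0 / 3.0) * math.log2(math.e) * math.sqrt(n)
            + math.log2(n))


p = partition_counts(1000)
for n in (10, 100, 1000):
    exact = math.log2(p[n]) + math.log2(n)
    print(n, round(exact, 2), round(theorem4_bound_bits(n), 2))
```

Both columns grow on the order of $\sqrt{n}$, so the bound per symbol vanishes as n grows.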
In (12), p(n) is used as an upper bound for $|\Phi^n|$. It may
seem possible that a tighter upper bound for $|\Phi^n|$ would improve
Theorem 4. However, the following lemma shows that this is
not the case and that p(n) is sufficiently tight.

Lemma 5. For the cardinality of $\Phi^n$ we have

|$ n | = p(rc) (l- 0(V^e-^) 

Remark 6. The related problem of finding the number of
partitions of n into at most k parts for $k > n^{1/6}$ was studied
by Szekeres [10], [11]. Our proof, however, is much simpler
since it considers the special case where $k > n/2$.



Proof: We first show that there is a one-to-one correspondence
between $\Phi^n$ and partitions of n with no part larger than
$\lfloor (n+1)/2 \rfloor$. Clearly, each $\bar{\varphi} \in \Phi^n$ determines a unique partition of
n. For a profile $\bar{\varphi}^n$ with a part of size $\mu$ larger than $\lfloor (n+1)/2 \rfloor$,
suppose $x^n$ is some sequence such that $\bar{\varphi}^n = \Phi(x^n)$. There is
a symbol in $x^n$ that appears $\mu$ times, say a. We need at least
$\mu - 1$ other symbols to separate every two occurrences of a.
However, this is not possible since $\mu + \mu - 1 > n$, and thus $x^n$
is not an alternating sequence. On the other hand, if all parts
are of size at most $\lfloor (n+1)/2 \rfloor$, occurrences of every symbol can
be separated by other symbols. This bijection implies that

$$|\Phi^n| = p\left(n, \left\lfloor \frac{n+1}{2} \right\rfloor\right), \tag{13}$$

where $p(n, r)$ denotes the number of partitions of n with
largest part of size at most r. Furthermore,

$$1 - \frac{|\Phi^n|}{p(n)} = \frac{1}{p(n)} \sum_{i=\lfloor (n+1)/2 \rfloor + 1}^{n} p(n-i) = \sum_{i=\lfloor (n+1)/2 \rfloor + 1}^{n} \frac{p(n-i)}{p(n)}.$$

The lemma then follows by using IflZl . p (n — i) /p(n) = 
(l + 0(n- 1 / 6 ))e^. ■ 
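The count in (13) can be evaluated with the same kind of dynamic program used for p(n), restricted to parts of bounded size. This is a sketch under the reading that the largest admissible part is $\lfloor (n+1)/2 \rfloor$; the function names are illustrative.

```python
def bounded_partitions(n, r):
    """p(n, r): number of partitions of n with largest part at most r."""
    counts = [1] + [0] * n
    for part in range(1, min(r, n) + 1):
        for total in range(part, n + 1):
            counts[total] += counts[total - part]
    return counts[n]


def alternating_profile_count(n):
    """|Phi^n| = p(n, floor((n + 1) / 2)), as in (13)."""
    return bounded_partitions(n, (n + 1) // 2)


for n in (4, 7, 20, 100):
    total = bounded_partitions(n, n)  # p(n), no restriction on part size
    print(n, alternating_profile_count(n), total,
          alternating_profile_count(n) / total)
```

For n = 7, the four excluded partitions are those with a part of size 5, 6, or 7, in agreement with the sum $p(2) + p(1) + p(0)$ from the last display of the proof.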

B. Lower Bound on the Redundancy of Alternating Patterns 

In Subsection IV-A we saw that the redundancy of patterns
of alternating sequences is $O(\sqrt{n})$. Here, we show that it is
bounded from below by a constant multiple of $n^{1/3}$.

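Lemma 7. Let $\psi = 1\psi_1 1\psi_2 \cdots 1\psi_{n/2}$, for even n, and $\psi =
1\psi_1 1\psi_2 \cdots 1\psi_{\lfloor n/2 \rfloor}1$, for odd n, be an alternating pattern. For
a function $r_n > 1$ of n, we have

$$\sup_{p \in \mathcal{V}_\Psi^n} p(\psi) \ge \frac{1}{2}\left(\frac{2r_n - 1}{2r_n}\right)^{\lfloor n/2 \rfloor} \prod_{\mu=1}^{\lfloor n/2 \rfloor} \varphi_\mu!\left(\frac{\mu}{\lfloor n/2 \rfloor}\right)^{\mu\varphi_\mu}, \tag{14}$$

where $\bar{\varphi}$ is the profile of the pattern $\psi_1\psi_2 \cdots \psi_{\lfloor n/2 \rfloor}$.

Proof: Note that since $\psi$ is an alternating pattern, we
have $\psi_j \neq 1$ for $1 \le j \le \lfloor n/2 \rfloor$. Consider the alphabet $A =
\{a, s_1, \ldots, s_m\}$, where m is the largest number appearing in
$\psi$ minus one. Let v be a sequence with pattern $\psi$ starting with
symbol a. Let p be a distribution defined as

$$p_a = \frac{2r_n - 1}{2r_n}, \qquad p_{s_i} = \frac{1}{2r_n} \cdot \frac{\mu(s_i, v)}{\lfloor n/2 \rfloor}, \quad 1 \le i \le m,$$

where $\mu(s_i, v)$ is the number of occurrences of $s_i$ in v. First,
suppose that n is even. We then have

$$p(v) = p_a \prod_{j=1}^{n/2} \frac{p_{v_{2j}}}{1 - p_a} \prod_{j=1}^{n/2-1} \frac{p_a}{1 - p_{v_{2j}}} \ge \left(\frac{p_a}{1 - p_a}\right)^{n/2} \prod_{j=1}^{n/2} p_{v_{2j}} = \left(\frac{2r_n - 1}{2r_n}\right)^{n/2} \prod_{\mu=1}^{n/2} \left(\frac{\mu}{n/2}\right)^{\mu\varphi_\mu}.$$

The position of a is fixed in v, but the symbols $\{s_1, \ldots, s_m\}$
that appear the same number of times can be swapped without
changing the pattern $\Psi(v)$ of v. This can be done in
$\prod_{\mu=1}^{n/2} \varphi_\mu!$ ways. Hence, there are $\prod_{\mu=1}^{n/2} \varphi_\mu!$ sequences
with pattern $\psi$ and probability $p(v)$. Thus,

$$\sup_{p \in \mathcal{V}_\Psi^n} p(\psi) \ge \left(\prod_{\mu=1}^{n/2} \varphi_\mu!\right) p(v) \ge \frac{1}{2}\left(\frac{2r_n - 1}{2r_n}\right)^{n/2} \prod_{\mu=1}^{n/2} \varphi_\mu!\left(\frac{\mu}{n/2}\right)^{\mu\varphi_\mu}.$$

Next suppose n is odd. Then,

$$p(v) = p_a \prod_{i=2}^{n} \frac{p_{v_i}}{1 - p_{v_{i-1}}} \ge \frac{1}{2}\left(\frac{p_a}{1 - p_a}\right)^{\lfloor n/2 \rfloor} \prod_{j=1}^{\lfloor n/2 \rfloor} p_{v_{2j}} = \frac{1}{2}\left(\frac{2r_n - 1}{2r_n}\right)^{\lfloor n/2 \rfloor} \prod_{\mu=1}^{\lfloor n/2 \rfloor} \left(\frac{\mu}{\lfloor n/2 \rfloor}\right)^{\mu\varphi_\mu},$$

where the inequality follows since $p_a > 1/2$. The number of
sequences with pattern $\psi$ and starting with a is $\prod_{\mu=1}^{\lfloor n/2 \rfloor} \varphi_\mu!$, which
gives (14). ■

Theorem 8. For the collection $\mathcal{V}_\Psi^n$ of distributions,

$$\hat{R}(\mathcal{V}_\Psi^n) \ge \frac{3}{2}\log(e)\left(\frac{n}{2}\right)^{1/3}(1 + o(1)).$$

Proof: From Shtarkov's sum we have that

$$\hat{R}(\mathcal{V}_\Psi^n) = \log\left(\sum_{\bar{\varphi} \in \Phi^n} \sum_{\psi \in \Psi_{\bar{\varphi}}} \sup_{p \in \mathcal{V}_\Psi^n} p(\psi)\right),$$

where $\Psi_{\bar{\varphi}}$ is the set of all patterns with profile $\bar{\varphi}$.

Suppose first that n is even. Let $\Phi_k^n$ be the set of profiles
whose largest parts are of size k. Since $\Phi_{n/2}^n \subseteq \Phi(\mathcal{V}_\Psi^n)$,

$$\hat{R}(\mathcal{V}_\Psi^n) \ge \log\left(\sum_{\bar{\varphi} \in \Phi_{n/2}^n} \sum_{\psi \in \Psi_{\bar{\varphi}}} \sup_{p \in \mathcal{V}_\Psi^n} p(\psi)\right) \ge \log\left(\sum_{\bar{\varphi} \in \Phi_{n/2}^n} \sum_{\psi \in \Psi_{\bar{\varphi}} :\, \mu(1, \psi) = n/2} \sup_{p \in \mathcal{V}_\Psi^n} p(\psi)\right) = \log\left(\sum_{\bar{\varphi} \in \Phi^{n/2}} \sum_{\psi \in \Psi_{\bar{\varphi}}} \sup_{p \in \mathcal{V}_\Psi^n} p(\psi_a(\psi))\right),$$

where $\psi_a(\psi)$ is the alternating pattern
$(1, \psi_1 + 1, 1, \psi_2 + 1, \ldots, 1, \psi_{n/2} + 1)$ obtained from $\psi$.

From (14), we obtain (15), in which (a) and (b) follow
from Lemma 3 in [8] and the proof of Theorem 13 in [8],
respectively. The proof for odd n is similar. ■

V. Sequential Estimators for Alternating Sequences

In the previous section we studied the problem of assigning
probabilities to patterns of a certain length without prior
information. In this section, we address a more practical problem.
Namely, given a pattern $\psi^{n-1}$ of length $n - 1$, what is our best
estimate $q(\psi_n \mid \psi^{n-1})$ of the probability of $\psi_n$ being the next
observed symbol, for $\psi_n \in \{1, 2, \ldots, 1 + \max_{j \le n-1} \psi_j\}$?
This sequential estimator also assigns probabilities to patterns
of length n in a natural way. That is,

$$q(\psi^n) = \prod_{i=1}^{n} q(\psi_i \mid \psi^{i-1}),$$

where $\psi^0$ is an empty string.

We present a sequential estimator $q_{1/2}$ for patterns of
alternating sequences which is based on a sequential estimator
for patterns of i.i.d. sequences presented in [8] by Orlitsky et
al.

Let

$$\hat{p}_{\Psi^n}(\psi^n) := \sup_{p \in \mathcal{V}_\Psi^n} p(\psi^n)$$

be the largest probability assigned to $\psi^n$ by any distribution
in $\mathcal{V}_\Psi^n$ and let q be the estimator defined in the proof of Theorem 4, i.e.,

$$q(\psi^n) = \frac{1/L(\psi^n)}{\sum_{z \in \Psi^n} 1/L(z)}, \tag{16}$$

for which we have

$$\frac{\hat{p}_{\Psi^n}(\psi^n)}{q(\psi^n)} \le n \exp\left(\pi\sqrt{\frac{2n}{3}}\right). \tag{17}$$

For an alternating pattern $\psi^i$, let $\Psi^n(\psi^i)$
be the set of alternating patterns of length $n \ge i$ whose first i
elements are the same as $\psi^i$. Accordingly, from $q(z)$, $z \in \Psi^n$,
we define the distribution

$$q^n(\psi^i) := q(\Psi^n(\psi^i)) = \sum_{z \in \Psi^n(\psi^i)} q(z)$$

over $\Psi^i$ for $i \le n$.

We define the estimator $q_{1/2}^n$ such that $q_{1/2}^n(1) := 1$ and

$$q_{1/2}^n(\psi_i \mid \psi^{i-1}) := \frac{q^n(\psi^i)}{q^n(\psi^{i-1})}$$

for $i \le n$. Note that $q_{1/2}^n(\psi^n) = q^n(\psi^n)$ and thus, by (17),
we have the following theorem.

Theorem 9. The redundancy of $q_{1/2}^n$ at time n is sub-linear
in n. Namely,

$$\hat{R}(\mathcal{V}_\Psi^n, q_{1/2}^n) = \sup_{\psi^n \in \Psi^n} \log\left(\frac{\hat{p}(\psi^n)}{q_{1/2}^n(\psi^n)}\right) \le \left(\pi\sqrt{\tfrac{2}{3}}\,\log e\right)\sqrt{n} + \log n.$$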


Note that $q_{1/2}^n$ has the drawback that it is applicable only
to patterns with predetermined length n. Such estimators are
called horizon-dependent; the assigned probabilities depend
on the "horizon" n. Thus $q_{1/2}^n$ cannot be used to sequentially
estimate the probabilities for a pattern whose length is unknown.

However, it is easy to remove this restriction using the so-called
"doubling trick," where time is divided into periods, each
twice as long as its predecessor. That is, the horizon $h_i$ at
time i is taken to be $2^{\lceil \log i \rceil}$, the smallest power of two
which is at least as large as i. The estimator $q_{1/2}$, where

$$q_{1/2}(1) = 1, \qquad q_{1/2}(\psi_i \mid \psi^{i-1}) = \frac{q^{h_i}(\psi^i)}{q^{h_i}(\psi^{i-1})},$$

is thus horizon-independent and the following theorem holds
uniformly over time.
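The horizon schedule of the doubling trick is simple to compute. The sketch below is an illustration only, with a function name chosen here; it returns the smallest power of two that is at least i and shows how the horizon changes at the period boundaries.

```python
def horizon(i):
    """Smallest power of two that is at least i, i.e. 2 ** ceil(log2(i))."""
    if i < 1:
        raise ValueError("time index must be a positive integer")
    # (i - 1).bit_length() equals ceil(log2(i)) for every integer i >= 1.
    return 1 << (i - 1).bit_length()


print([horizon(i) for i in range(1, 18)])
```

By construction the horizon at time n is always smaller than 2n, which is the property used when the bounds for the horizon-dependent estimators are translated into a bound in terms of n.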

Theorem 10. The worst case redundancy of the sequential 
estimator q 1 / 2 is bounded by 

Proof: By definition, 

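$$\hat{R}(\mathcal{V}_\Psi^n, q_{1/2}) \le \max_{\psi^n \in \Psi^n} \log \frac{\hat{p}(\psi^n)}{q_{1/2}(\psi^n)}.$$

Write

$$\frac{\hat{p}(\psi^n)}{q_{1/2}(\psi^n)} = \frac{\hat{p}(\psi^n)}{q^{h_n}(\psi^n)} \cdot \frac{q^{h_n}(\psi^n)}{q_{1/2}(\psi^n)}.$$

From Lemmas 11 and 12 we obtain

$$\frac{\hat{p}(\psi^n)}{q_{1/2}(\psi^n)} \le h_n^{(\log h_n + 1)/2} \exp\left(\pi\sqrt{\tfrac{2}{3}}\,\frac{\sqrt{2}}{\sqrt{2} - 1}\sqrt{h_n}\right).$$

The theorem follows after some minor algebra and by noting
that $h_n < 2n$. ■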



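Lemma 11. For an alternating pattern $\psi^n$, we have

$$\frac{\hat{p}(\psi^n)}{q^{h_n}(\psi^n)} \le h_n \exp\left(\pi\sqrt{\tfrac{2}{3}}\sqrt{h_n}\right).$$

Proof: For any i.i.d. induced distribution p over alternating
patterns, and for $t \ge n$, note that

$$\sum_{z \in \Psi^t(\psi^n)} p(z) = p(\psi^n).$$

Hence,

$$\hat{p}(\psi^n) = \sup_{p} \sum_{z \in \Psi^{h_n}(\psi^n)} p(z) \tag{18}$$

$$\le \sum_{z \in \Psi^{h_n}(\psi^n)} \hat{p}(z). \tag{19}$$

Using (17), we can write

$$\sum_{z \in \Psi^{h_n}(\psi^n)} \hat{p}(z) \le h_n \exp\left(\pi\sqrt{\tfrac{2}{3}}\sqrt{h_n}\right) \sum_{z \in \Psi^{h_n}(\psi^n)} q(z). \tag{20}$$

Furthermore, observe that

$$\sum_{z \in \Psi^{h_n}(\psi^n)} q(z) = q^{h_n}(\psi^n). \tag{21}$$

From (18), (20), and (21), we obtain the desired result. ■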

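Lemma 12. For $n \ge 2$ and an alternating pattern $\psi^n$, we
have

$$\frac{q^{h_n}(\psi^n)}{q_{1/2}(\psi^n)} \le h_n^{(\log h_n - 1)/2} \exp\left(\pi\sqrt{\tfrac{2}{3}}\,\frac{\sqrt{h_n}}{\sqrt{2} - 1}\right).$$

Proof: We show inductively that

$$\frac{q^{2^{i+1}}(\psi^n)}{q_{1/2}(\psi^n)} \le 2^{i(i+1)/2} \exp\left(\pi\sqrt{\tfrac{2}{3}}\,\frac{\sqrt{2^{i+1}}}{\sqrt{2} - 1}\right)$$

for $2^i < n \le 2^{i+1}$ and an alternating pattern $\psi^n$. This shall
prove the lemma since $h_n = 2^{i+1}$.

From the definition of $q_{1/2}$, it follows that

$$\frac{q^{2^{i+1}}(\psi^n)}{q_{1/2}(\psi^n)} = \frac{q^{2^{i+1}}(\psi^{2^i})}{q_{1/2}(\psi^{2^i})} = \frac{q^{2^{i+1}}(\psi^{2^i})}{q^{2^i}(\psi^{2^i})} \cdot \frac{q^{2^i}(\psi^{2^i})}{q_{1/2}(\psi^{2^i})}. \tag{22}$$

As the induction hypothesis, we have that

$$\frac{q^{2^i}(\psi^{2^i})}{q_{1/2}(\psi^{2^i})} \le 2^{i(i-1)/2} \exp\left(\pi\sqrt{\tfrac{2}{3}}\,\frac{\sqrt{2^i}}{\sqrt{2} - 1}\right). \tag{23}$$

With a slight abuse of notation, let $L(\psi^{2^i})$ also denote the set of
patterns with the same profile and the same multiplicity of the last
element as $\psi^{2^i}$. All patterns in $L(\psi^{2^i})$ have the same assigned probability
$q^{2^{i+1}}(\psi^{2^i})$. Hence,

$$\sum_{z \in L(\psi^{2^i})} q^{2^{i+1}}(z) = |L(\psi^{2^i})|\; q^{2^{i+1}}(\psi^{2^i}).$$

On the other hand, for all patterns $\psi \in L(\psi^{2^i})$ the sets
$\Psi^{2^{i+1}}(\psi)$ are disjoint. This implies that

$$\sum_{z \in L(\psi^{2^i})} q^{2^{i+1}}(z) = \sum_{z \in L(\psi^{2^i})} \sum_{y \in \Psi^{2^{i+1}}(z)} q(y) \le \sum_{y \in \Psi^{2^{i+1}}} q(y) \le 1.$$

Thus, we obtain

$$q^{2^{i+1}}(\psi^{2^i}) \le \frac{1}{|L(\psi^{2^i})|}. \tag{24}$$

From (16), we have

$$q^{2^i}(\psi^{2^i}) \ge \frac{1}{2^i \exp\left(\pi\sqrt{\tfrac{2}{3}}\sqrt{2^i}\right) |L(\psi^{2^i})|}. \tag{25}$$

From (24) and (25), we find

$$\frac{q^{2^{i+1}}(\psi^{2^i})}{q^{2^i}(\psi^{2^i})} \le 2^i \exp\left(\pi\sqrt{\tfrac{2}{3}}\sqrt{2^i}\right).$$

This inequality, along with (22) and (23), completes the proof.

■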

VI. Estimating Distribution of Source 

In this section, we explain how to reconstruct the noiseless 
source probabilities from estimates of probabilities provided 
by alternating sequences. First, recall from Lemma 2 that with
high probability, n is of the same order as N and thus, with 
high probability, the length n of the alternating sequence is 
large if the length of the source sequence is large. 

Assume that the source has alphabet $A = \{a_1, a_2, \ldots\}$ with
probability $p_{a_j}$ for element $a_j$. Suppose $p_{a_i a_j}$ is the probability
of observing $a_j$ after $a_i$ in the alternating sequence and assume
that the correct values of $p_{a_1 a_2}$ and $p_{a_2 a_1}$ are given. We have

$$p_{a_1 a_2} = \frac{p_{a_2}}{1 - p_{a_1}}, \qquad p_{a_2 a_1} = \frac{p_{a_1}}{1 - p_{a_2}},$$

which implies that $p_{a_1}$ and $p_{a_2}$ can be found by

$$p_{a_1} = p_{a_2 a_1}\,\frac{1 - p_{a_1 a_2}}{1 - p_{a_1 a_2}\,p_{a_2 a_1}}, \qquad p_{a_2} = p_{a_1 a_2}\,\frac{1 - p_{a_2 a_1}}{1 - p_{a_1 a_2}\,p_{a_2 a_1}}.$$

Given $p_{a_1 a_j}$ for $j > 2$, the remaining probabilities may be
obtained by noting that $\frac{p_{a_1 a_j}}{p_{a_1 a_2}} = \frac{p_{a_j}}{p_{a_2}}$, and thus

$$p_{a_j} = p_{a_2}\,\frac{p_{a_1 a_j}}{p_{a_1 a_2}}$$

gives the probabilities $p_{a_j}$ for $j \ge 3$.
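The inversion formulas above can be exercised on a toy example. In the sketch below, the transition probabilities are computed exactly from a hypothetical three-symbol source, so the recovery is exact; with estimated transition probabilities the same formulas would return estimates. The function names and the example distribution are assumptions made here for illustration.

```python
from fractions import Fraction


def transition(p, i, j):
    """p_{a_i a_j} = p_{a_j} / (1 - p_{a_i}) for the alternating chain."""
    return p[j] / (1 - p[i])


def recover_source(p12, p21, p1_rest):
    """Recover source probabilities from alternating-chain transitions.

    p12, p21 -- transition probabilities a_1 -> a_2 and a_2 -> a_1
    p1_rest  -- transition probabilities a_1 -> a_j for j = 3, 4, ...
    """
    denom = 1 - p12 * p21
    p1 = p21 * (1 - p12) / denom
    p2 = p12 * (1 - p21) / denom
    rest = [p2 * p1j / p12 for p1j in p1_rest]
    return [p1, p2] + rest


# Hypothetical source distribution (indices 0, 1, 2 stand for a_1, a_2, a_3).
source = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p12 = transition(source, 0, 1)
p21 = transition(source, 1, 0)
p13 = transition(source, 0, 2)

print(recover_source(p12, p21, [p13]))
```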

Although, as with any estimator, the estimators presented
here for the alternating sequence do not find probabilities
with zero error, we are justified in assuming that the estimates
obtained from these estimators are "close" to the correct values
since their redundancy is vanishing. Hence, the estimates of
the probabilities $p_{a_i a_j}$ of the alternating sequence can be used
to obtain estimates for the probabilities $p_{a_i}$ of the source sequence
as explained above.